AITopics | out-of-domain data

out-of-domain data

MIDG: Mixture of Invariant Experts with knowledge injection for Domain Generalization in Multimodal Sentiment Analysis

Li, Yangle, Luo, Danli, Hu, Haifeng

arXiv.org Artificial Intelligence, Dec-9-2025

Existing methods in domain generalization for Multimodal Sentiment Analysis (MSA) often overlook inter-modal synergies during invariant feature extraction, which prevents the accurate capture of the rich semantic information within multimodal data. Additionally, while knowledge injection techniques have been explored in MSA, they often suffer from fragmented cross-modal knowledge, overlooking specific representations that exist beyond the confines of a single modality. To address these limitations, we propose a novel MSA framework designed for domain generalization. First, the framework incorporates a Mixture of Invariant Experts model to extract domain-invariant features, thereby enhancing the model's capacity to learn synergistic relationships between modalities. Second, we design a Cross-Modal Adapter to augment the semantic richness of multimodal representations through cross-modal knowledge injection. Extensive domain experiments conducted on three datasets demonstrate that the proposed MIDG achieves superior performance.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2512.0743

Country:

North America (0.28)
Oceania > Australia (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.74)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach

Wu, Yurong, Mu, Fangwen, Zhang, Qiuhong, Zhao, Jinjing, Xu, Xinrun, Mei, Lingrui, Wu, Yang, Shi, Lin, Wang, Junjie, Ding, Zhiming, Wang, Yiwei

arXiv.org Artificial Intelligence, Feb-20-2025

Prompt trading has emerged as a significant intellectual property concern in recent years, where vendors entice users by showcasing sample images before selling prompt templates that can generate similar images. This work investigates a critical security vulnerability: attackers can steal prompt templates using only a limited number of sample images. To investigate this threat, we introduce Prism, a prompt-stealing benchmark consisting of 50 templates and 450 images, organized into Easy and Hard difficulty levels. To identify the vulnerability of VLMs to prompt stealing, we propose EvoStealer, a novel template stealing method that operates without model fine-tuning by leveraging differential evolution algorithms. The system first initializes population sets using multimodal large language models (MLLMs) based on predefined patterns, then iteratively generates enhanced offspring through MLLMs. During evolution, EvoStealer identifies common features across offspring to derive generalized templates. Our comprehensive evaluation conducted across open-source (INTERNVL2-26B) and closed-source models (GPT-4o and GPT-4o-mini) demonstrates that EvoStealer's stolen templates can reproduce images highly similar to originals and effectively generalize to other subjects, significantly outperforming baseline methods with an average improvement of over 10%. Moreover, our cost analysis reveals that EvoStealer achieves template stealing with negligible computational expenses. Our code and dataset are available at https://github.com/whitepagewu/evostealer.

evostealer, prompt template, template, (15 more...)

arXiv.org Artificial Intelligence

2502.14285

Country: North America > United States > California (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning

Yang, Jiange, Zhu, Haoyi, Wang, Yating, Wu, Gangshan, He, Tong, Wang, Limin

arXiv.org Artificial Intelligence, Nov-21-2024

Learning from multiple domains is a primary factor that influences the generalization of a single unified robot system. In this paper, we aim to learn the trajectory prediction model by using broad out-of-domain data to improve its performance and generalization ability. The trajectory model is designed to predict any-point trajectories in the current frame given an instruction and can provide detailed control guidance for robotic policy learning. To handle the diverse out-of-domain data distribution, we propose a sparsely-gated MoE (Top-1 gating strategy) architecture for the trajectory model, coined as Tra-MoE. The sparse activation design enables a good balance between parameter cooperation and specialization, effectively benefiting from large-scale out-of-domain data while maintaining constant FLOPs per token. In addition, we further introduce an adaptive policy conditioning technique by learning 2D mask representations for predicted trajectories, which is explicitly aligned with image observations to guide action prediction more flexibly. We perform extensive experiments on both simulation and real-world scenarios to verify the effectiveness of Tra-MoE and the adaptive policy conditioning technique. We also conduct a comprehensive empirical study to train Tra-MoE, demonstrating that our Tra-MoE consistently exhibits superior performance compared to the dense baseline model, even when the latter is scaled to match Tra-MoE's parameter count.
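
As a rough illustration of the Top-1 gating idea mentioned in this abstract, the sketch below routes a single token to its one highest-scoring expert. It is a generic, minimal example and not the Tra-MoE implementation: the function names (`top1_moe`, `matvec`), the linear gate, the linear experts, and all dimensions are assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def top1_moe(token, gate_weights, experts):
    """Route one token to its single highest-probability expert.

    gate_weights: one row per expert (hypothetical linear gate).
    experts: one weight matrix per expert (hypothetical linear experts).
    """
    probs = softmax(matvec(gate_weights, token))
    best = max(range(len(probs)), key=probs.__getitem__)
    # Only the selected expert is evaluated, so per-token compute does not
    # grow with the number of experts.
    expert_out = matvec(experts[best], token)
    # Scale by the gate probability, as is common in sparsely gated MoE layers.
    return best, [probs[best] * y for y in expert_out]


random.seed(0)
dim, num_experts = 4, 3
gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
experts = [
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
    for _ in range(num_experts)
]
token = [0.5, -1.0, 0.25, 2.0]
chosen, output = top1_moe(token, gate_weights, experts)
print("expert:", chosen, "output:", output)
```

Because exactly one expert matrix is applied per token, adding more experts increases the parameter count while the per-token expert cost stays the same, which is the property the abstract describes as constant FLOPs per token.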

artificial intelligence, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2411.14519

Country:

North America > Montserrat (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Reviews: Partially-Supervised Image Captioning

Neural Information Processing Systems, Oct-8-2024, 05:13:01 GMT

The authors tackle the problem of image captioning in the setting where captions may not be available for "novel" categories. This setting is well motivated by the authors and is a practical one - it is easier to scale image-level annotations than it is to get image captions. They present an approach that uses some paired image captions and visual labels of novel categories to caption novel categories. The visual labels of novel categories are combined using an automaton to only accept captions that contain some mention of the new classes. The method is evaluated on the novel class split proposed by Hendricks et al.

novel category, partially-supervised image captioning, visual label, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.62)

Song Emotion Classification of Lyrics with Out-of-Domain Data under Label Scarcity

Sakunkoo, Jonathan, Sakunkoo, Annabella

arXiv.org Artificial Intelligence, Oct-8-2024

Songs have been found to profoundly impact human emotions, with lyrics having significant power to stimulate emotional changes in the audience. There is a scarcity of large, high-quality in-domain datasets for lyrics-based song emotion classification (Edmonds and Sedoc, 2021; Zhou, 2022). It has been noted that in-domain training datasets are often difficult to acquire (Zhang and Miao, 2023) and that label acquisition is often limited by cost, time, and other factors (Azad et al., 2018). We examine the novel usage of a large out-of-domain dataset as a creative solution to the challenge of training data scarcity in the emotional classification of song lyrics. We find that CNN models trained on a large Reddit comments dataset achieve satisfactory performance and generalizability to lyrical emotion classification, thus offering insights into, and a promising avenue for, leveraging large, publicly available out-of-domain datasets in domains whose in-domain data are lacking or costly to acquire.

classification, emotion, lyric, (12 more...)

arXiv.org Artificial Intelligence

2410.05778

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Media (0.74)
Health & Medicine (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Exploring Robustness in Doctor-Patient Conversation Summarization: An Analysis of Out-of-Domain SOAP Notes

Chen, Yu-Wen, Hirschberg, Julia

arXiv.org Artificial Intelligence, Jun-4-2024

Summarizing medical conversations poses unique challenges due to the specialized domain and the difficulty of collecting in-domain training data. In this study, we investigate the performance of state-of-the-art doctor-patient conversation generative summarization models on out-of-domain data. We divide the summarization model of doctor-patient conversation into two configurations: (1) a general model, without specifying subjective (S), objective (O), assessment (A), and plan (P) notes; (2) a SOAP-oriented model that generates a summary with SOAP sections. We analyzed the limitations and strengths of the fine-tuning language model-based methods and GPTs on both configurations. We also conducted a Linguistic Inquiry and Word Count analysis to compare the SOAP notes from different datasets. The results exhibit a strong correlation for reference notes across different datasets, indicating that format mismatch (i.e., discrepancies in word distribution) is not the main cause of performance decline on out-of-domain data. Lastly, a detailed analysis of SOAP notes is included to provide insights into missing information and hallucinations introduced by the models.

category, dataset, information, (15 more...)

arXiv.org Artificial Intelligence

2406.02826

Country: North America > United States > Florida > Hillsborough County > University (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.97)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.54)

Singer Identity Representation Learning using Self-Supervised Techniques

Torres, Bernardo, Lattner, Stefan, Richard, Gaël

arXiv.org Artificial Intelligence, Jan-10-2024

Significant strides have been made in creating voice identity representations using speech data. However, the same level of progress has not been achieved for singing voices. To bridge this gap, we suggest a framework for training singer identity encoders to extract representations suitable for various singing-related tasks, such as singing voice similarity and synthesis. We explore different self-supervised learning techniques on a large collection of isolated vocal tracks and apply data augmentations during training to ensure that the representations are invariant to pitch and content variations. We evaluate the quality of the resulting representations on singer similarity and identification tasks across multiple datasets, with a particular emphasis on out-of-domain generalization. Our proposed framework produces high-quality embeddings that outperform both speaker verification and wav2vec 2.0 pre-trained baselines on singing voice while operating at 44.1 kHz. We release our code and trained models to facilitate further research on singing voice and related areas.

dataset, representation, singer identification, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.5281/zenodo.10265323

2401.05064

Country:

North America > United States > Tennessee > Davidson County > Nashville (0.04)
Europe > Italy > Lombardy > Milan (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Media > Music (0.69)
Leisure & Entertainment (0.69)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)
(3 more...)

JCoLA: Japanese Corpus of Linguistic Acceptability

Someya, Taiga, Sugimoto, Yushi, Oseki, Yohei

arXiv.org Artificial Intelligence, Sep-22-2023

Neural language models have exhibited outstanding performance in a range of downstream tasks. However, there is limited understanding regarding the extent to which these models internalize syntactic knowledge, so various datasets have recently been constructed to facilitate syntactic evaluation of language models across languages. In this paper, we introduce JCoLA (Japanese Corpus of Linguistic Acceptability), which consists of 10,020 sentences annotated with binary acceptability judgments. Specifically, those sentences are manually extracted from linguistics textbooks, handbooks and journal articles, and split into in-domain data (86%; relatively simple acceptability judgments extracted from textbooks and handbooks) and out-of-domain data (14%; theoretically significant acceptability judgments extracted from journal articles), the latter of which is categorized by 12 linguistic phenomena. We then evaluate the syntactic knowledge of 9 different types of Japanese language models on JCoLA. The results demonstrated that several models could surpass human performance for the in-domain data, while no models were able to exceed human performance for the out-of-domain data. Error analyses by linguistic phenomena further revealed that although neural language models are adept at handling local syntactic dependencies like argument structure, their performance wanes when confronted with long-distance syntactic dependencies like verbal agreement and NPI licensing.

east asian ling, language model, linguistic phenomenon, (13 more...)

arXiv.org Artificial Intelligence

2309.12676

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Tōhoku (0.07)
Europe > Belgium > Brussels-Capital Region > Brussels (0.05)
(16 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.47)

A New AI Research Introduces CACTI: A Framework For Multi-Task Multi-Scene Robot Manipulation - MarkTechPost

#artificialintelligence, Mar-29-2023, 19:15:33 GMT

Recent advances in learning-based control have brought us closer to the objective of building an embodied agent with generalizable human-like abilities. Natural language processing (NLP) and computer vision (CV) have come a long way, thanks in large part to the availability of structured datasets on a massive scale. Web-scale datasets with high-quality photos and text have demonstrated significant improvements using the same fundamental methods. Nevertheless, gathering data on a comparable scale for robot learning is impossible due to logistical difficulties. Collecting demonstrations via teleoperation is laborious and time-consuming compared to the plethora of online textual and visual data.

marktechpost, multi-task multi-scene robot manipulation, new ai research introduce cacti, (7 more...)

#artificialintelligence

Genre: Research Report (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.43)

3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection

Lehner, Alexander, Gasperini, Stefano, Marcos-Ramiro, Alvaro, Schmidt, Michael, Mahani, Mohammad-Ali Nikouei, Navab, Nassir, Busam, Benjamin, Tombari, Federico

arXiv.org Artificial Intelligence, May-3-2022

As 3D object detection on point clouds relies on the geometrical relationships between the points, non-standard object shapes can hinder a method's detection capability. However, in safety-critical settings, robustness to out-of-domain and long-tail samples is fundamental to circumvent dangerous issues, such as the misdetection of damaged or rare cars. In this work, we substantially improve the generalization of 3D object detectors to out-of-domain data by deforming point clouds during training. We achieve this with 3D-VField: a novel data augmentation method that plausibly deforms objects via vector fields learned in an adversarial fashion. Our approach constrains 3D points to slide along their sensor view rays while neither adding nor removing any of them. The obtained vectors are transferable, sample-independent and preserve shape and occlusions. Despite training only on a standard dataset, such as KITTI, augmenting with our vector fields significantly improves the generalization to differently shaped objects and scenes. Towards this end, we propose and share CrashD: a synthetic dataset of realistic damaged and rare cars, with a variety of crash scenarios. Extensive experiments on KITTI, Waymo, our CrashD and SUN RGB-D show the generalizability of our techniques to out-of-domain data, different models and sensors, namely LiDAR and ToF cameras, for both indoor and outdoor scenes. Our CrashD dataset is available at https://crashd-cars.github.io.
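
The geometric constraint described in this abstract, points sliding along their sensor view rays with none added or removed, can be illustrated with a small sketch. This is not the 3D-VField method itself: the learned adversarial vector field is replaced here by hand-picked per-point offsets, and the function name `slide_along_rays` and the sensor position at the origin are assumptions made for illustration.

```python
import math


def slide_along_rays(points, offsets, sensor=(0.0, 0.0, 0.0)):
    """Move each 3D point along the ray from the sensor through that point.

    offsets: signed distance (same units as the points) to move each point
    away from (+) or towards (-) the sensor. These values are hypothetical
    stand-ins for a learned deformation.
    """
    moved = []
    for point, delta in zip(points, offsets):
        ray = [p - s for p, s in zip(point, sensor)]
        distance = math.sqrt(sum(c * c for c in ray))
        if distance == 0.0:
            # A point at the sensor position has no ray direction; keep it.
            moved.append(tuple(point))
            continue
        scale = (distance + delta) / distance
        moved.append(tuple(s + c * scale for s, c in zip(sensor, ray)))
    return moved


points = [(3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
offsets = [0.5, -0.25]
deformed = slide_along_rays(points, offsets)
print(deformed)
```

Each output point stays on the line between the sensor and its original position, so the angular layout seen by the sensor is unchanged and the point count is preserved; only the measured range of each point changes.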

artificial intelligence, deformation, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/CVPR52688.2022.01678

2112.04764

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Transportation > Ground > Road (0.93)
Automobiles & Trucks (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
